davies_bouldin_score (Davies–Bouldin Index, DBI)#
The Davies–Bouldin index (DBI) is an internal clustering metric: it scores a clustering using only the data \(X\) and the predicted cluster labels.
Intuition: good clusterings have compact clusters that are far from each other. DBI summarizes this as the average, over clusters, of each cluster’s worst-case similarity to any other cluster.
Lower is better.
Quick import#
from sklearn.metrics import davies_bouldin_score
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score as sk_davies_bouldin_score
from sklearn.preprocessing import StandardScaler
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
1) When DBI is a good choice#
Use DBI when you:
have no ground-truth labels (unsupervised setting)
want a fast sanity check / model-selection criterion for a clustering
expect clusters to be reasonably blob-like / convex (centroid-based geometry)
DBI is commonly used to compare settings like the number of clusters \(k\) in \(k\)-means.
2) A quick visual example#
DBI needs predicted cluster labels. Below we fit \(k\)-means on a 2D toy dataset and compute DBI.
# 2D toy dataset (3 blobs)
X, _ = make_blobs(
n_samples=600,
centers=[(-4, -1), (0, 3), (4, -1)],
cluster_std=[0.7, 1.0, 0.8],
random_state=42,
)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
dbi = sk_davies_bouldin_score(X, labels)
df = pd.DataFrame({"x1": X[:, 0], "x2": X[:, 1], "cluster": labels.astype(str)})
fig = px.scatter(
df,
x="x1",
y="x2",
color="cluster",
title=f"K-means clustering (k=3) — DBI={dbi:.3f} (lower is better)",
)
fig.add_trace(
go.Scatter(
x=kmeans.cluster_centers_[:, 0],
y=kmeans.cluster_centers_[:, 1],
mode="markers",
marker=dict(symbol="x", size=12, color="black"),
name="centroid",
)
)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()
dbi
0.38664615200057156
3) Definition (math)#
Assume data points \(x_n \in \mathbb{R}^d\) and a clustering into \(k\) non-empty clusters \(C_1,\dots,C_k\).
Centroid of cluster \(i\):

\[\mu_i = \frac{1}{|C_i|}\sum_{x \in C_i} x.\]

Within-cluster scatter (as used by scikit-learn):

\[S_i = \frac{1}{|C_i|}\sum_{x \in C_i} \lVert x - \mu_i \rVert_2.\]

Between-cluster separation (centroid distance):

\[M_{ij} = \lVert \mu_i - \mu_j \rVert_2.\]

Pairwise “similarity” (high means bad: large scatter and/or small separation):

\[R_{ij} = \frac{S_i + S_j}{M_{ij}}, \quad i \ne j.\]

For each cluster, take the worst-case neighbor:

\[D_i = \max_{j \ne i} R_{ij}.\]

Finally, DBI is the average:

\[\mathrm{DBI} = \frac{1}{k}\sum_{i=1}^k D_i.\]
A perfect value is \(0\) (zero scatter with distinct centroids). In practice, lower is better.
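To make the definitions concrete, here is a tiny hand-checkable example; the commented values follow directly from the formulas above.
# Two 2-point clusters: mu_0 = (0, 1), mu_1 = (4, 1)
# S_0 = S_1 = 1 (each point is distance 1 from its centroid), M_01 = 4
# R_01 = (S_0 + S_1) / M_01 = 2 / 4 = 0.5  ->  DBI = 0.5
X_tiny = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
labels_tiny = np.array([0, 0, 1, 1])
print(davies_bouldin_score(X_tiny, labels_tiny))  # 0.5
# Zero scatter with distinct centroids is the perfect case: DBI = 0
X_perfect = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]])
print(davies_bouldin_score(X_perfect, np.array([0, 0, 1, 1])))  # 0.0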
4) A low-level NumPy implementation#
Below is a from-scratch implementation that matches sklearn.metrics.davies_bouldin_score for Euclidean distances.
def davies_bouldin_index_np(X, labels):
"""Compute the Davies–Bouldin index (DBI) from scratch.
Parameters
----------
X : array-like, shape (n_samples, n_features)
Data matrix.
labels : array-like, shape (n_samples,)
Cluster labels.
Returns
-------
float
Davies–Bouldin index. Lower is better.
"""
X = np.asarray(X, dtype=float)
labels = np.asarray(labels)
if X.ndim != 2:
raise ValueError("X must be 2D of shape (n_samples, n_features).")
if labels.ndim != 1 or labels.shape[0] != X.shape[0]:
raise ValueError("labels must be 1D with length n_samples.")
_, inv = np.unique(labels, return_inverse=True)
k = int(inv.max() + 1)
if k < 2:
raise ValueError("Davies–Bouldin index requires at least 2 clusters.")
counts = np.bincount(inv, minlength=k)
if np.any(counts == 0):
raise ValueError("Empty cluster detected.")
# Centroids: mu_i
centroids = np.zeros((k, X.shape[1]), dtype=float)
np.add.at(centroids, inv, X)
centroids /= counts[:, None]
# Scatter: S_i = mean(||x - mu_i||)
dists = np.linalg.norm(X - centroids[inv], axis=1)
scatters = np.bincount(inv, weights=dists, minlength=k) / counts
# Separation: M_ij = ||mu_i - mu_j||
diff = centroids[:, None, :] - centroids[None, :, :]
centroid_dist = np.linalg.norm(diff, axis=2)
# R_ij = (S_i + S_j) / M_ij
with np.errstate(divide="ignore", invalid="ignore"):
R = (scatters[:, None] + scatters[None, :]) / centroid_dist
np.fill_diagonal(R, -np.inf)
D = np.max(R, axis=1) # worst neighbor for each cluster
return float(np.mean(D))
# Sanity check: NumPy vs scikit-learn
dbi_np = davies_bouldin_index_np(X, labels)
dbi_sk = sk_davies_bouldin_score(X, labels)
dbi_np, dbi_sk, abs(dbi_np - dbi_sk)
(0.3866461520005713, 0.38664615200057156, 2.7755575615628914e-16)
# DBI is lower when clusters are compact and well-separated (try random labels)
labels_random = rng.integers(0, 3, size=X.shape[0])
dbi_random = davies_bouldin_index_np(X, labels_random)
dbi_np, dbi_random
(0.3866461520005713, 26.361322636978866)
5) Visualizing the ingredients (\(S_i\), \(M_{ij}\), and the “worst neighbor”)#
DBI is easiest to understand by looking at its building blocks:
\(S_i\): how spread-out cluster \(i\) is around its centroid (average radius)
\(M_{ij}\): distance between centroids (how far two clusters are)
\(R_{ij} = (S_i + S_j)/M_{ij}\): “overlap risk” between clusters \(i\) and \(j\)
\(D_i = \max_{j\ne i} R_{ij}\): cluster \(i\) only cares about its most confusing neighbor
DBI averages the \(D_i\) values.
def dbi_components_np(X, labels):
"""Return DBI components for inspection/plotting."""
X = np.asarray(X, dtype=float)
labels = np.asarray(labels)
unique_labels, inv = np.unique(labels, return_inverse=True)
k = unique_labels.size
if k < 2:
raise ValueError("Need at least 2 non-empty clusters.")
counts = np.bincount(inv, minlength=k)
if np.any(counts == 0):
raise ValueError("Empty cluster detected.")
centroids = np.zeros((k, X.shape[1]), dtype=float)
np.add.at(centroids, inv, X)
centroids /= counts[:, None]
dists = np.linalg.norm(X - centroids[inv], axis=1)
S = np.bincount(inv, weights=dists, minlength=k) / counts
diff = centroids[:, None, :] - centroids[None, :, :]
M = np.linalg.norm(diff, axis=2)
with np.errstate(divide="ignore", invalid="ignore"):
R = (S[:, None] + S[None, :]) / M
np.fill_diagonal(R, -np.inf)
worst_neighbor = np.argmax(R, axis=1)
D = R[np.arange(k), worst_neighbor]
dbi = float(D.mean())
return {
"unique_labels": unique_labels,
"centroids": centroids,
"S": S,
"M": M,
"R": R,
"worst_neighbor": worst_neighbor,
"D": D,
"dbi": dbi,
}
comp = dbi_components_np(X, labels)
centroids = comp["centroids"]
S = comp["S"]
worst = comp["worst_neighbor"]
D = comp["D"]
dbi = comp["dbi"]
# Points + centroids + (approx) scatter circles + arrows to worst neighbor
fig = px.scatter(
df,
x="x1",
y="x2",
color="cluster",
title=f"DBI ingredients — DBI={dbi:.3f}",
)
fig.add_trace(
go.Scatter(
x=centroids[:, 0],
y=centroids[:, 1],
mode="markers",
marker=dict(symbol="x", size=12, color="black"),
name="centroid",
)
)
shapes = []
for i, (cx, cy) in enumerate(centroids[:, :2]):
r = float(S[i])
shapes.append(
dict(
type="circle",
xref="x",
yref="y",
x0=cx - r,
x1=cx + r,
y0=cy - r,
y1=cy + r,
line=dict(color="rgba(0,0,0,0.35)", dash="dot"),
)
)
fig.update_layout(shapes=shapes)
for i, j in enumerate(worst):
fig.add_trace(
go.Scatter(
x=[centroids[i, 0], centroids[j, 0]],
y=[centroids[i, 1], centroids[j, 1]],
mode="lines",
line=dict(color="rgba(0,0,0,0.5)", dash="dash"),
showlegend=False,
hoverinfo="skip",
)
)
fig.add_annotation(
x=centroids[i, 0],
y=centroids[i, 1],
text=f"S={S[i]:.2f}<br>D={D[i]:.2f}",
showarrow=True,
arrowhead=2,
ax=25,
ay=-25,
)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()
# For small k, R_ij is nice to inspect directly
R_show = comp["R"].copy()
np.fill_diagonal(R_show, np.nan)
fig2 = px.imshow(
R_show,
text_auto=".2f",
aspect="auto",
title="Pairwise ratios R_ij = (S_i+S_j)/M_ij (diagonal undefined)",
)
fig2.update_layout(xaxis_title="j", yaxis_title="i")
fig2.show()
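Before moving on, the per-cluster ingredients are easy to tabulate directly from the comp dict computed above (worst_neighbor holds positional indices, which we map back to the original labels):
# Per-cluster summary: scatter S_i, most confusing neighbor, and worst-case ratio D_i
summary = pd.DataFrame(
    {
        "cluster": comp["unique_labels"],
        "S_i": comp["S"],
        "worst_neighbor": comp["unique_labels"][comp["worst_neighbor"]],
        "D_i": comp["D"],
    }
)
summary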
6) How DBI reacts to separation and dispersion#
DBI decreases when you pull clusters apart (larger \(M_{ij}\)), and increases when you make clusters more spread-out (larger \(S_i\)).
To isolate the effect, we keep true labels fixed and only change the geometry.
# Build two Gaussian blobs from shared base samples so geometry changes aren't confounded by resampling noise
rng_demo = np.random.default_rng(123)
n1, n2 = 500, 500
Z1 = rng_demo.standard_normal((n1, 2))
Z2 = rng_demo.standard_normal((n2, 2))
labels_true_2 = np.r_[np.zeros(n1, dtype=int), np.ones(n2, dtype=int)]
def make_two_blobs_np(separation, std):
c1 = np.array([-separation / 2.0, 0.0])
c2 = np.array([separation / 2.0, 0.0])
X1 = Z1 * std + c1
X2 = Z2 * std + c2
return np.vstack([X1, X2])
separations = np.linspace(0.5, 8.0, 16)
rows = []
for sep in separations:
X_sep = make_two_blobs_np(sep, std=1.0)
rows.append({"separation": float(sep), "dbi": davies_bouldin_index_np(X_sep, labels_true_2)})
df_sep = pd.DataFrame(rows)
fig = px.line(
df_sep,
x="separation",
y="dbi",
markers=True,
title="DBI decreases as clusters are pulled apart (spread fixed)",
)
fig.update_layout(xaxis_title="Separation", yaxis_title="Davies–Bouldin index")
fig.show()
stds = np.linspace(0.3, 3.0, 16)
rows = []
for std in stds:
X_std = make_two_blobs_np(separation=5.0, std=float(std))
rows.append({"std": float(std), "dbi": davies_bouldin_index_np(X_std, labels_true_2)})
df_std = pd.DataFrame(rows)
fig = px.line(
df_std,
x="std",
y="dbi",
markers=True,
title="DBI increases as clusters get more diffuse (separation fixed)",
)
fig.update_layout(xaxis_title="Cluster standard deviation", yaxis_title="Davies–Bouldin index")
fig.show()
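A useful contrast before the next section: rescaling the whole dataset by a constant scales every \(S_i\) and every \(M_{ij}\) by the same factor, so DBI is invariant to global rescaling. Doubling both the separation and the std is exactly such a rescaling here (a quick sketch using the helpers above):
# Global rescaling leaves DBI unchanged (per-feature rescaling, next section, does not)
d_small = davies_bouldin_index_np(make_two_blobs_np(separation=4.0, std=1.0), labels_true_2)
d_large = davies_bouldin_index_np(make_two_blobs_np(separation=8.0, std=2.0), labels_true_2)
d_small, d_large  # equal up to floating-point error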
7) Common pitfall: DBI is scale-dependent#
DBI is based on Euclidean distances, so units matter. If one feature has a much larger scale than the others, it can dominate both the clustering and the DBI.
Rule of thumb: when using DBI with distance-based clustering, standardize features (e.g., z-score) unless you have a strong reason not to.
X0, _ = make_blobs(
n_samples=600,
centers=[(-3, 0), (0, 4), (3, 0)],
cluster_std=[0.8, 0.8, 0.8],
random_state=0,
)
# Make x1 much larger-scale than x2
X_bad = X0.copy()
X_bad[:, 0] *= 12.0
k_bad = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_bad)
labels_bad = k_bad.labels_
dbi_bad = sk_davies_bouldin_score(X_bad, labels_bad)
scaler = StandardScaler()
X_bad_std = scaler.fit_transform(X_bad)
k_std = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_bad_std)
labels_std = k_std.labels_
dbi_std = sk_davies_bouldin_score(X_bad_std, labels_std)
# Plot both clusterings in the original (scaled) coordinates
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(
f"Raw scaled features (DBI={dbi_bad:.3f})",
f"After StandardScaler (DBI={dbi_std:.3f})",
),
)
for cl in np.unique(labels_bad):
m = labels_bad == cl
fig.add_trace(
go.Scatter(x=X_bad[m, 0], y=X_bad[m, 1], mode="markers", marker=dict(size=4), name=f"cluster {cl}"),
row=1,
col=1,
)
for cl in np.unique(labels_std):
m = labels_std == cl
fig.add_trace(
go.Scatter(
x=X_bad[m, 0],
y=X_bad[m, 1],
mode="markers",
marker=dict(size=4),
name=f"cluster {cl}",
showlegend=False,
),
row=1,
col=2,
)
fig.add_trace(
go.Scatter(
x=k_bad.cluster_centers_[:, 0],
y=k_bad.cluster_centers_[:, 1],
mode="markers",
marker=dict(symbol="x", size=10, color="black"),
name="centroid",
),
row=1,
col=1,
)
# Centroids are in standardized space; invert for plotting
centroids_std_original = scaler.inverse_transform(k_std.cluster_centers_)
fig.add_trace(
go.Scatter(
x=centroids_std_original[:, 0],
y=centroids_std_original[:, 1],
mode="markers",
marker=dict(symbol="x", size=10, color="black"),
name="centroid",
showlegend=False,
),
row=1,
col=2,
)
fig.update_xaxes(title_text="x1 (scaled units)", row=1, col=1)
fig.update_xaxes(title_text="x1 (scaled units)", row=1, col=2)
fig.update_yaxes(title_text="x2", row=1, col=1)
fig.update_yaxes(title_text="x2", row=1, col=2)
fig.update_layout(title="Pitfall: DBI depends on feature scaling")
fig.show()
dbi_bad, dbi_std
(0.3978227444815982, 0.3866099752111638)
8) Using DBI to optimize a simple algorithm (from-scratch \(k\)-means)#
DBI is a clustering metric: it needs cluster assignments, so it’s not a natural objective for supervised models like linear/logistic regression.
A common “optimization” use is model selection: pick the hyperparameter (often \(k\)) that minimizes DBI.
Below is a low-level \(k\)-means implementation (NumPy) and a DBI-based search over \(k\).
def kmeans_pp_init_np(X, k, rng):
"""k-means++ initialization (NumPy).
Returns centroids of shape (k, n_features).
"""
X = np.asarray(X, dtype=float)
n = X.shape[0]
centroids = np.empty((k, X.shape[1]), dtype=float)
# First centroid uniformly at random
i0 = int(rng.integers(0, n))
centroids[0] = X[i0]
# Dist^2 to closest centroid so far
closest_dist2 = np.sum((X - centroids[0]) ** 2, axis=1)
for c in range(1, k):
total = float(closest_dist2.sum())
if total == 0.0:
# All points are identical.
centroids[c:] = X[i0]
break
probs = closest_dist2 / total
idx = int(rng.choice(n, p=probs))
centroids[c] = X[idx]
dist2_new = np.sum((X - centroids[c]) ** 2, axis=1)
closest_dist2 = np.minimum(closest_dist2, dist2_new)
return centroids
def kmeans_np(X, k, n_init=10, max_iter=200, tol=1e-6, rng=None):
"""Simple k-means (Lloyd) with k-means++ init.
Returns
-------
labels : (n_samples,)
centroids : (k, n_features)
inertia : float
"""
X = np.asarray(X, dtype=float)
if rng is None:
rng = np.random.default_rng(0)
n, d = X.shape
best_inertia = np.inf
best_labels = None
best_centroids = None
for _ in range(int(n_init)):
centroids = kmeans_pp_init_np(X, k, rng)
for _it in range(int(max_iter)):
# Assign to nearest centroid (squared Euclidean)
dist2 = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
labels = np.argmin(dist2, axis=1)
counts = np.bincount(labels, minlength=k)
if np.any(counts == 0):
# Rare but possible: re-seed empty centroid to a random data point
empty = np.where(counts == 0)[0]
centroids[empty] = X[rng.integers(0, n, size=empty.size)]
continue
new_centroids = np.zeros((k, d), dtype=float)
np.add.at(new_centroids, labels, X)
new_centroids /= counts[:, None]
shift = float(np.linalg.norm(new_centroids - centroids))
centroids = new_centroids
if shift <= tol:
break
# Final inertia
dist2 = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
labels = np.argmin(dist2, axis=1)
inertia = float(dist2[np.arange(n), labels].sum())
if inertia < best_inertia:
best_inertia = inertia
best_labels = labels
best_centroids = centroids
return best_labels, best_centroids, best_inertia
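As a quick sanity check on the from-scratch \(k\)-means, we can compare its inertia against scikit-learn's KMeans on the section-2 blobs. Exact agreement isn't guaranteed (the random initializations differ), but on easy data the inertias should nearly match.
# Cross-check kmeans_np against scikit-learn's KMeans
labels_chk, centroids_chk, inertia_chk = kmeans_np(X, 3, n_init=10, rng=np.random.default_rng(0))
km_ref = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
inertia_chk, float(km_ref.inertia_)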
# Dataset with a "true" k (4 blobs)
Xk, _ = make_blobs(n_samples=900, centers=4, cluster_std=[0.6, 0.7, 0.5, 0.8], random_state=7)
Xk = StandardScaler().fit_transform(Xk)
k_grid = range(2, 9)
rows = []
best = None
for k in k_grid:
labels_k, centroids_k, inertia_k = kmeans_np(Xk, k, n_init=10, max_iter=200, rng=np.random.default_rng(0))
dbi_k = davies_bouldin_index_np(Xk, labels_k)
rows.append({"k": int(k), "dbi": dbi_k, "inertia": inertia_k})
if best is None or dbi_k < best["dbi"]:
best = {"k": int(k), "dbi": dbi_k, "labels": labels_k, "centroids": centroids_k, "inertia": inertia_k}
df_k = pd.DataFrame(rows)
fig = px.line(
df_k,
x="k",
y="dbi",
markers=True,
title="Pick k by minimizing DBI (k-means, from scratch)",
)
fig.update_layout(xaxis=dict(dtick=1), xaxis_title="k", yaxis_title="Davies–Bouldin index")
fig.add_vline(x=best["k"], line_dash="dash", line_color="black")
fig.show()
df_k.sort_values("dbi")
|   | k | dbi | inertia |
|---|---|---|---|
| 2 | 4 | 0.183769 | 21.285507 |
| 1 | 3 | 0.292989 | 167.162616 |
| 3 | 5 | 0.564168 | 18.386764 |
| 5 | 7 | 0.746407 | 14.345365 |
| 4 | 6 | 0.776640 | 16.098401 |
| 0 | 2 | 0.782764 | 731.474743 |
| 6 | 8 | 0.907931 | 12.673546 |
# Visualize: a "wrong" k vs the DBI-selected k
k_wrong = 2
labels_wrong, centroids_wrong, _ = kmeans_np(Xk, k_wrong, n_init=10, rng=np.random.default_rng(1))
dbi_wrong = davies_bouldin_index_np(Xk, labels_wrong)
labels_best = best["labels"]
dbi_best = best["dbi"]
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(f"k={k_wrong} (DBI={dbi_wrong:.3f})", f"k={best['k']} (DBI={dbi_best:.3f})"),
)
for cl in np.unique(labels_wrong):
m = labels_wrong == cl
fig.add_trace(go.Scatter(x=Xk[m, 0], y=Xk[m, 1], mode="markers", marker=dict(size=4), name=f"cluster {cl}"), row=1, col=1)
for cl in np.unique(labels_best):
m = labels_best == cl
fig.add_trace(
go.Scatter(x=Xk[m, 0], y=Xk[m, 1], mode="markers", marker=dict(size=4), name=f"cluster {cl}", showlegend=False),
row=1,
col=2,
)
fig.update_xaxes(title_text="feature 1 (standardized)", row=1, col=1)
fig.update_xaxes(title_text="feature 1 (standardized)", row=1, col=2)
fig.update_yaxes(title_text="feature 2 (standardized)", row=1, col=1)
fig.update_yaxes(title_text="feature 2 (standardized)", row=1, col=2)
fig.update_layout(title="DBI as a model-selection criterion")
fig.show()
9) Practical usage (sklearn)#
In practice, you usually compute DBI on the same representation you used to cluster:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score
X_std = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_std)
score = davies_bouldin_score(X_std, labels) # lower is better
A quick pattern for selecting \(k\) is to scan a small grid and pick the minimum.
k_grid = range(2, 9)
rows = []
X_std = StandardScaler().fit_transform(X)
for k in k_grid:
labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std)
rows.append({"k": int(k), "dbi": sk_davies_bouldin_score(X_std, labels_k)})
df_scan = pd.DataFrame(rows)
fig = px.line(df_scan, x="k", y="dbi", markers=True, title="sklearn: scan k and pick the minimum DBI")
fig.update_layout(xaxis=dict(dtick=1), xaxis_title="k", yaxis_title="Davies–Bouldin index")
fig.show()
df_scan.sort_values("dbi")
|   | k | dbi |
|---|---|---|
| 1 | 3 | 0.390869 |
| 2 | 4 | 0.635738 |
| 0 | 2 | 0.770866 |
| 3 | 5 | 0.776498 |
| 4 | 6 | 0.890938 |
| 5 | 7 | 0.950224 |
| 6 | 8 | 0.968447 |
10) Pros, cons, and common pitfalls#
Pros#
No labels needed: internal metric for unsupervised learning.
Fast: \(O(n\,d)\) to compute scatters plus \(O(k^2 d)\) for centroid distances.
Interpretable pieces: explicitly trades off within-cluster scatter vs centroid separation.
Good fit for centroid-based clustering: especially \(k\)-means-like settings.
Cons#
Centroid assumption: can be misleading for non-convex / manifold clusters (e.g., moons) where “centroid distance” is not the right notion of separation.
Scale-sensitive: feature units strongly affect Euclidean distances (standardization is often required).
Not a supervised metric: DBI needs cluster labels; it’s not meaningful for regression/classification objectives directly.
Can favor too many clusters: increasing \(k\) often reduces within-cluster scatter, which can sometimes push DBI down even when clusters become less meaningful.
Common pitfalls / diagnostics#
DBI is undefined for \(k=1\).
If two clusters have identical centroids (\(M_{ij}=0\)), the ratio \(R_{ij}\) is infinite and DBI blows up (see the sketch after this list).
Always compare DBI within a fixed preprocessing + distance setup (don’t mix scaled/unscaled).
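A minimal sketch of both failure modes, using the from-scratch implementation above (scikit-learn guards degenerate centroid distances internally, so its output can differ on the second case):
# Edge case 1: a single cluster (k=1) is rejected
try:
    davies_bouldin_index_np(X, np.zeros(X.shape[0], dtype=int))
except ValueError as e:
    print("k=1:", e)
# Edge case 2: two clusters sharing a centroid -> M_ij = 0 and DBI blows up to inf
X_deg = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
labels_deg = np.array([0, 0, 1, 1])
davies_bouldin_index_np(X_deg, labels_deg)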
Where it tends to work well#
Picking \(k\) for \(k\)-means on roughly spherical blobs.
Fast internal validation when you want an alternative to silhouette score (quick comparison below).
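For a quick head-to-head, both metrics take the same (X, labels) inputs; just remember their orientations are opposite (silhouette: higher is better, DBI: lower is better).
from sklearn.metrics import silhouette_score
# Refit on the section-2 blobs so both metrics score the same clustering
labels_cmp = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
silhouette_score(X, labels_cmp), davies_bouldin_score(X, labels_cmp)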
Exercises#
Implement a variant where scatter uses RMS distance: \(S_i = \sqrt{\frac{1}{|C_i|}\sum_{x\in C_i}\lVert x-\mu_i\rVert_2^2}\). Compare it to scikit-learn’s definition on random datasets.
Construct a dataset where DBI keeps decreasing as \(k\) increases (over-segmentation). What does the geometry look like?
Compare DBI and silhouette score on:
spherical blobs
two moons (non-convex)
Which metric aligns better with human intuition in each case?
References#
Original paper: Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 224–227.
scikit-learn API: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html
scikit-learn model evaluation guide: https://scikit-learn.org/stable/modules/model_evaluation.html